Markov Chains with the markovchain package

Package author: Giorgio Alfredo Spedicato et al.

library("markovchain")
library("expm")

Defining a Markov chain

A discrete time Markov chain (DTMC) is a sequence of identically distributed random variables \(X_1,\,X_2,\,\ldots,\,X_n,\,\ldots\) characterized by the Markov property:

\[ p(X_{n+1} = x_{n+1} \,|\, X_1 = x_1,\, X_2 = x_2,\, \ldots,\, X_n = x_n) = p(X_{n+1} = x_{n+1} \,|\, X_n = x_n). \]

The chain moves from one state to another (this change is named either ‘transition’ or ‘step’) and the probability \(p_{ij}\) to move from state \(s_i\) to state \(s_j\) in one step is named transition probability:

\[ p_{ij} = p(X_1 = s_j \,|\, X_0 = s_i ). \]

The probability of moving from state \(i\) to \(j\) in \(n\) steps is denoted by

\[ p^{(n)}_{ij} = p(X_n = s_j \,|\, X_0 = s_i). \]

The set of possible states \(S = \{s_1,\,s_2,\,\ldots,\,s_r\}\) of \(X_n\) can be finite or countable and it is named the state space of the chain.

The probability distribution of transitions from one state to another can be represented in a transition matrix \(P = \left[p_{ij}\right]_{i,j}\), where the element in position \((i, j)\) is the transition probability \(p_{ij}\).

Let’s define a first Markov chain as an object of the class “markovchain”:

library("markovchain")

weatherStates <- c("sunny", "cloudy", "rain")
byRow <- TRUE
weatherMatrix <- matrix(data = c(0.70, 0.20, 0.10,
                                 0.30, 0.40, 0.30,
                                 0.20, 0.45, 0.35),
                        byrow = byRow, nrow = 3,
                        dimnames = list(weatherStates, weatherStates))

mcWeather <- new("markovchain", states = weatherStates, byrow = byRow,
                 transitionMatrix = weatherMatrix, name = "Weather")

Handling markovchain objects

For example, using the previously defined matrix, we can find the probability distribution of the weather states in two and in seven days, given that the current state is cloudy.

initialState <- c(0, 1, 0)
after2Days <- initialState * (mcWeather ^ 2)
after7Days <- initialState * (mcWeather ^ 7)
after2Days
##      sunny cloudy  rain
## [1,]  0.39  0.355 0.255

Multiplying the initial state vector by the squared transition matrix (using the %^% operator from the expm package) should yield the same result as above:

initialState %*% (weatherMatrix %^% 2)
##      sunny cloudy  rain
## [1,]  0.39  0.355 0.255

Rounding probabilities:

round(after7Days, 3)
##      sunny cloudy  rain
## [1,] 0.462  0.319 0.219

A similar result could have been obtained by defining the vector of probabilities as a column vector instead of a row vector. A column-defined probability matrix can be set up either by creating a new matrix or by transposing an existing markovchain object with the t method.

initialState <- c(0, 1, 0)
after2Days <- (t(mcWeather) * t(mcWeather)) * initialState
after7Days <- (t(mcWeather) ^ 7) * initialState
after2Days
##         [,1]
## sunny  0.390
## cloudy 0.355
## rain   0.255
states(mcWeather)
## [1] "sunny"  "cloudy" "rain"
dim(mcWeather)
## [1] 3

Direct access to transition probabilities is provided both by the transitionProbability method and by the “[” method.

transitionProbability(mcWeather, "cloudy", "rain")
## [1] 0.3
mcWeather[2,3]
## [1] 0.3

To get information about a markovchain object:

show(mcWeather)
## Weather 
##  A  3 - dimensional discrete Markov Chain defined by the following states: 
##  sunny, cloudy, rain 
##  The transition matrix  (by rows)  is defined as follows: 
##        sunny cloudy rain
## sunny    0.7   0.20 0.10
## cloudy   0.3   0.40 0.30
## rain     0.2   0.45 0.35

Defining graphs in order to print a Markov chain

Markov chains are special graphs, and as graphs they can be drawn by any general-purpose graph-plotting tool; here we use the R package DiagrammeR.
Let’s begin by comparing the basic plot available for a markovchain object with the improved DiagrammeR rendering:

plot(mcWeather)

Using the package DiagrammeR:

library(DiagrammeR)

grViz("
  digraph {
    node [shape = box]
    sunny -> cloudy [label = 0.2]
    sunny -> rain [label = 0.1]
    cloudy -> sunny [label = 0.3]
    cloudy -> rain [label = 0.3]
    rain -> sunny [label = 0.2]
    rain -> cloudy [label = 0.45]
  }
")

Conversion from the class markovchain to the class data.frame and vice versa is possible:

mcDf <- as(mcWeather, "data.frame")
mcNew <- as(mcDf, "markovchain")
mcDf
##       t0     t1 prob
## 1  sunny  sunny 0.70
## 2  sunny cloudy 0.20
## 3  sunny   rain 0.10
## 4 cloudy  sunny 0.30
## 5 cloudy cloudy 0.40
## 6 cloudy   rain 0.30
## 7   rain  sunny 0.20
## 8   rain cloudy 0.45
## 9   rain   rain 0.35
classes <- sapply(mcDf,class)
classes
##        t0        t1      prob 
##  "factor"  "factor" "numeric"

As you see, we can define a markovchain object from a data.frame.

The following development shows how to build such a graph from a data.frame.

aa <- apply(mcDf, 1, 
          function(r) paste(r["t0"], "->", r["t1"], 
                            "[label =", r["prob"], "]")
          )
# aa is a character vector; we want to obtain a single big string.
paste(aa, collapse = " ")
## [1] "sunny -> sunny [label = 0.70 ] sunny -> cloudy [label = 0.20 ] sunny -> rain [label = 0.10 ] cloudy -> sunny [label = 0.30 ] cloudy -> cloudy [label = 0.40 ] cloudy -> rain [label = 0.30 ] rain -> sunny [label = 0.20 ] rain -> cloudy [label = 0.45 ] rain -> rain [label = 0.35 ]"
bb <- paste0("digraph {
    node [shape = box]
    ",
    paste(aa, collapse = " "),
    "}"
  )
bb
## [1] "digraph {\n    node [shape = box]\n    sunny -> sunny [label = 0.70 ] sunny -> cloudy [label = 0.20 ] sunny -> rain [label = 0.10 ] cloudy -> sunny [label = 0.30 ] cloudy -> cloudy [label = 0.40 ] cloudy -> rain [label = 0.30 ] rain -> sunny [label = 0.20 ] rain -> cloudy [label = 0.45 ] rain -> rain [label = 0.35 ]}"
grViz(bb)

We wrap this construction in a function:
input: a 4-tuple (data, from, to, prob), where data is a data.frame in which each row stands for an edge, and where from, to and prob are the names (given as strings) of the columns of data that define the edges of the graph;
output: a graph plot.

library(DiagrammeR)
graphe <- function(data, from, to, prob){
  # build one DOT edge statement per row of data
  aa <- apply(data, 1,
              function(r) paste(r[from], "->", r[to],
                                "[label =", r[prob], "]"))
  # collapse the character vector into a single DOT string
  bb <- paste0("digraph {
    node [shape = box]
    ",
    paste(aa, collapse = " "),
    "}"
  )
  grViz(bb)
}

graphe(mcDf,"t0","t1","prob")

If edges have zero probability, we don’t want them to appear on the graph. Remember that the graph of a Markov chain has far fewer edges than the corresponding complete graph. That’s why we define another function whose specific task is to hide the edges with zero probability:

reduce_zero_probability_edges <- function(graph_data, from_name, to_name, probability_name){
# input: a data.frame of edges (such as mcDf) and the names of its from, to and probability columns
# output: a plot of the corresponding graph, with zero-probability edges hidden
  
  # Create a graph
  library(DiagrammeR)
  g <- create_graph()
  g <- add_global_graph_attrs(
    g, attr = "width", value = 0.3, attr_type = "node")
  
  # Extract the required columns from the data
  from_nodes <- as.vector(graph_data[,from_name])
  to_nodes <- as.vector(graph_data[,to_name])
  probability <- as.vector(graph_data[,probability_name])
 
  # Add nodes
  for(i in 1:length(unique(c(from_nodes, to_nodes)))) {
    g <- add_node(g,
                  label = unique(c(from_nodes, to_nodes))[i])
  }
  
  # Add edges
  for(i in 1:length(from_nodes)) {
    g <- add_edge(g, 
                  from = from_nodes[i],
                  to = to_nodes[i],
                  edge_aes = edge_aes(color = ifelse(probability[i] == 0, "transparent", "black"),
                                      label = probability[i],
                                      fontcolor = ifelse(probability[i] == 0, "transparent", "black")))
  }
  # Return the graph
 render_graph(g)
}

reduce_zero_probability_edges(mcDf,"t0","t1","prob")

A bit of theory about Markov chains: the algebraic point of view

Feres R (2007). “Notes for Math 450 MATLAB Listings for Markov Chains.”

Consider a Markov chain \(X_0,\,X_1,\,X_2,\,\ldots\), with transition probability matrix \(P\) and set of states \(S\). A state \(j\) is said to be accessible from \(i\) if for some \(n\ge 0\) the probability of going from \(i\) to \(j\) in \(n\) steps is positive, that is, \(p^{(n)}_{ij} > 0\). We write \(i \to j\) to represent this. If \(i \to j\) and \(j \to i\), we say that \(i\) and \(j\) communicate and denote it by \(i \leftrightarrow j\). The definition of communicating states introduces an equivalence relation on the set of states. This means, by definition, that \(\leftrightarrow\) satisfies the properties of reflexivity, symmetry and transitivity.

An equivalence relation on a set \(S\) decomposes the set into equivalence classes. If \(S\) is countable, this means that \(S\) can be partitioned into subsets \(C_1,\,C_2,\,C_3,\,\ldots\) of \(S\), such that two elements \(i\), \(j\) of \(S\) satisfy \(i \leftrightarrow j\) if and only if they belong to the same subset of the partition. If a state \(i\) belongs to \(C_u\) for some \(u\), we say that \(i\) is a representative of the equivalence class \(C_u\). For the specific equivalence relation we are considering here, we call each set \(C_u\) a communicating class of \(P\). Note, in particular, that any two communicating classes are either equal or disjoint, and their union is the whole set of states.

Closed classes and irreducible chains

A communicating class is said to be closed if no states outside of the class can be reached from any state inside it. Therefore, once the Markov chain reaches a closed communicating class, it can no longer escape it. If the single point set \(\{i\}\) comprises a closed communicating class, we say that \(i\) is an absorbing state. The Markov chain, or the stochastic matrix, are called irreducible if \(S\) consists of a single communicating class.

As a simple example, consider the stochastic matrix

\[ P=\left[ \begin{array}{cc} \frac{1}{2} & \frac{1}{2} \\ 0 & 1 \end{array} \right] \]

The set of states is \(\{1, 2\}\). The communicating class containing \(1\) is the single point set \(\{1\}\), and the communicating class containing \(2\) is \(\{2\}\). The class \(\{2\}\) is closed since \(1\) cannot be reached from \(2\), but \(\{1\}\) is not closed since there is a positive probability of leaving it. Therefore, \(2\) is an absorbing state and \(P\) (or any chain defined by the matrix \(P\)) is not irreducible.

We wish now to obtain an algorithm for finding the communicating classes of a stochastic matrix \(P\), and for determining whether or not they are closed. It is convenient to use the function notation \(C(i)\) to denote the communicating class containing \(i\). It follows from the definition of \(C(i)\) that it is the intersection of two sets:

  1. \(\quad T(i)\): the set of all states in \(S\) that are accessible from \(i\), or the to-set;
  2. \(\quad F(i)\): the set of all states in \(S\) from which \(i\) can be reached, or the from-set.

In other words, \(j\) belongs to \(T(i)\) if and only if \(i \to j\); and \(j\) belongs to \(F(i)\) if and only if \(j \to i\). Notice that the communicating class of \(i\) is the intersection of the two:

\[ C(i) = T(i) \cap F(i). \]

Moreover, the class \(C(i)\) is closed exactly when \(C(i) = T(i)\), i.e. when any state that can be arrived at from \(i\) already belongs to \(C(i)\).

Algorithm for finding \(C(i)\)

The following algorithm partitions a finite set of states \(S\) into communicating classes. Let \(m\) denote the number of elements in \(S\).

  1. For each \(i\) in \(S\), let \(T(i) = \{i\}\);

  2. For each \(i \in S\), do the following: for each \(k\in T(i)\), add to \(T(i)\) all states \(j\) such that \(p_{kj} > 0\). Repeat this step until the number of elements in \(T(i)\) stops growing. When there are no further elements to add, we have obtained to-sets \(T(i)\) for all the states in \(S\). A convenient way to express the set \(T(i)\) is as a row vector of length \(m\) of 0s and 1s, where the \(j\)th entry is \(1\) if \(j\) belongs to \(T(i)\) and \(0\) otherwise. Viewed this way, we have just constructed an \(m\)-by-\(m\) matrix \(T\) of 0s and 1s such that \(T(i, j) = 1\) if \(i \to j\), and \(0\) otherwise.

  3. To obtain \(F(i)\) for all \(i\), first define the \(m\)-by-\(m\) matrix \(F\) equal to the transpose of \(T\). In other words, \(F(i, j) = T(j, i)\). Thus, the \(i\)th row of \(F\) is a vector of 0s and 1s and an entry \(1\) at position \(j\) indicates that state \(i\) can be reached from state \(j\).

  4. Now define \(C\) as the \(m\)-by-\(m\) matrix such that

\[ C(i, j) = T(i, j)F(i, j). \]

Notice that \(C(i, j)\) is \(1\) if \(j\) is both in the to-set and in the from-set of \(i\), and it is \(0\) otherwise.

  5. The class \(C(i)\) is now the set of indices \(j\) for which \(C(i, j) = 1\). The class is closed exactly when \(C(i) = T(i)\). A sketch of this algorithm in R follows below.
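
As an illustration, here is a minimal R sketch of this algorithm (our own helper, not the markovchain implementation): it grows the to-sets into a reachability matrix, transposes it to get the from-sets, and multiplies elementwise.

find_classes <- function(P) {
  m <- nrow(P)
  Tmat <- diag(1, m)                           # step 1: T(i) = {i}
  repeat {                                     # step 2: grow the to-sets
    Tnew <- 1 * ((Tmat %*% (P > 0) + Tmat) > 0)
    if (all(Tnew == Tmat)) break
    Tmat <- Tnew
  }
  Fmat <- t(Tmat)                              # step 3: F(i, j) = T(j, i)
  Cmat <- Tmat * Fmat                          # step 4: C(i, j) = T(i, j)F(i, j)
  list(C = Cmat,                               # step 5: C(i) = {j : C(i, j) = 1}
       closed = apply(Cmat == Tmat, 1, all))   # closed exactly when C(i) = T(i)
}
find_classes(weatherMatrix)

For mcWeather every state communicates with every other, so C is a matrix of ones and each row is marked as closed.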

Canonical form of \(P\)

Suppose that we have found the communicating classes of \(P\) and know which ones are closed. We can now use this information to rewrite \(P\) by re-indexing the set of states in a way that makes the general structure of the matrix more apparent. First, let \(E_1,\,\ldots E_n\) be the closed communicating classes. All the other classes are lumped together into a set \(T\) (for transient). Now re-index \(S\) so that the elements of \(E_1\) come first, followed by the elements of \(E_2\), etc. The elements of \(T\) are listed last. In particular, \(1\) now represents a state in \(E_1\) and \(m\) (the size of \(S\)) represents a state in \(T\). We still denote the resulting stochastic matrix by \(P\). Notice that \(p_{ij} = 0\) if \(i\) and \(j\) belong to different closed classes; it is also zero if \(i\) is in a closed class and \(j\) is in the transient set \(T\). Thus the matrix \(P\) takes the block form

\[ P=\left[ \begin{array}{cccccc} P_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & P_2 & 0 & \cdots & 0 & 0\\ 0 & 0 & P_3 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & P_n & 0\\ R_1 & R_2 & R_3 & \cdots & R_n & V\\ \end{array} \right] \]

The square block \(P_i\) defines a stochastic matrix on the set \(E_i\).

An example

P <- matrix(c(1/2,0,1/2,0,0,0,0,0,0,0,
              0,1/3,0,0,0,0,2/3,0,0,0,
              1,0,0,0,0,0,0,0,0,0,
              0,0,0,0,1,0,0,0,0,0,
              0,0,0,1/3,1/3,1/3,0,0,0,0,
              0,0,0,0,0,1,0,0,0,0,
              0,0,0,0,0,0,1/4,0,3/4,0,
              0,0,1/4,1/4,0,0,0,1/4,0,1/4,
              0,1,0,0,0,0,0,0,0,0,
              0,0,0,1/3,1/3,0,0,0,0,1/3), nrow=10, ncol=10, byrow=TRUE)

\[ P=\left[ \begin{array}{cccccccccc} \frac{1}{2} & 0 & \frac{1}{2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \frac{1}{3} & 0 & 0 & 0 & 0 & \frac{2}{3} & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{4} & 0 & \frac{3}{4} & 0 \\ 0 & 0 & \frac{1}{4} & \frac{1}{4} & 0 & 0 & 0 & \frac{1}{4} & 0 & \frac{1}{4} \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac{1}{3} & \frac{1}{3} & 0 & 0 & 0 & 0 & \frac{1}{3} \end{array} \right] \] We wish to find the communication classes, determine which ones are closed, and put \(P\) in canonical form.

library("markovchain")
mcP <- as(P, "markovchain")
communicatingClasses(mcP)
## [[1]]
## [1] "s1" "s3"
## 
## [[2]]
## [1] "s2" "s7" "s9"
## 
## [[3]]
## [1] "s4" "s5"
## 
## [[4]]
## [1] "s6"
## 
## [[5]]
## [1] "s8"
## 
## [[6]]
## [1] "s10"
summary(mcP)
## Unnamed Markov chain  Markov chain that is composed by: 
## Closed classes: 
## s1 s3 
## s2 s7 s9 
## s6 
## Recurrent classes: 
## {s1,s3},{s2,s7,s9},{s6}
## Transient classes: 
## {s4,s5},{s8},{s10}
## The Markov chain is not irreducible 
## The absorbing states are: s6
mcP_canonic <- canonicForm(mcP)
P_canonic <- as(mcP_canonic, "matrix")
P_canonic <- round(P_canonic,2)

\[ P_\mbox{canonic}= \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \frac{1}{3} & \frac{2}{3} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac{1}{4} & \frac{3}{4} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & 0 \\ 0 & \frac{1}{4} & 0 & 0 & 0 & 0 & \frac{1}{4} & 0 & \frac{1}{4} & \frac{1}{4} \\ 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{3} & \frac{1}{3} & 0 & \frac{1}{3} \end{bmatrix} \]

Before the canonization of the Markov chain:

library("fractional")
PDf <- as(mcP,"data.frame")
v <- fractional(PDf$prob)
v <- as.character(v)
PDf$prob <- v
reduce_zero_probability_edges(PDf,"t0","t1","prob")

After the canonization of the Markov chain:

library("fractional")
PDf_canonic <- as(mcP_canonic,"data.frame")
v <- fractional(PDf_canonic$prob)
v <- as.character(v)
PDf_canonic$prob <- v
reduce_zero_probability_edges(PDf_canonic,"t0","t1","prob")

We conclude that the canonization of a Markov chain simplifies the form of the transition matrix but not its graph.

Next questions to explore

  1. The canonization of the transition matrix could be seen as a change of basis, provided we could interpret the columns of the matrix as vector quantities. How can this be done?

  2. Consider the graph on the left-hand side of the figure below, representing an irreducible Markov chain. Bunching together the states as in the graph on the right-hand side, we note that the set of states decomposes into three subsets that are visited in cyclic order.

Transition diagram of an irreducible Markov chain of period 3 with cyclic classes \(S_1 = \{1, 2\}, S_2 = \{4, 7\}\), and \(S_3 = \{3, 5, 6\}\).

This type of cyclic structure (possibly consisting of a single subset) is a general feature. Hence the following question:

What is the period of a given irreducible Markov chain?

  3. How to obtain a cyclic decomposition of a given Markov chain?

Statistics about Markov chains

The markovchain package contains functions to analyse DTMCs from a probabilistic perspective. For example, the package provides methods for finding stationary distributions and for identifying absorbing and transient states. Many of these methods come from MATLAB listings that have been ported to R. For a full description of the underlying theory and algorithms, the interested reader can consult the original MATLAB listings, Feres (2007) and Montgomery (2009).

The conditional distribution of weather states, given that current day’s weather is sunny, is given by following code.

conditionalDistribution(mcWeather, "sunny")
##  sunny cloudy   rain 
##    0.7    0.2    0.1

List of methods that can be applied on markovchain objects to perform probabilistic analysis

absorbingStates: the absorbing states of the transition matrix, if any.
steadyStates: the vector(s) of steady state(s) in matrix form.
meanFirstPassageTime: matrix or vector of mean first passage times
meanRecurrenceTime: vector of mean number of steps to return to each recurrent state
hittingProbabilities: matrix of hitting probabilities for a Markov chain
meanAbsorptionTime: expected number of steps for a transient state to be absorbed by any recurrent class
absorptionProbabilities: probabilities of transient states of being absorbed by each recurrent state
committorAB: committor probabilities
communicatingClasses: list of communicating classes
conditionalDistribution: the conditional distribution of the subsequent state s_j, given the current state s_i
canonicForm: the transition matrix into canonic form
is.accessible: checks whether a state j is reachable from state i
is.irreducible: checks whether a DTMC is irreducible
is.regular: checks whether a DTMC is regular
period: the period of an irreducible DTMC
recurrentClasses: list of recurrent communicating classes
transientClasses: list of transient communicating classes
recurrentStates: the recurrent states of the transition matrix
transientStates: the transient states of the transition matrix, if any
summary: DTMC summary

Stationary (steady state, or equilibrium) vector

A stationary (steady state, or equilibrium) vector is a probability vector \(\pi\) such that \(\pi\cdot P=\pi\), where \(P\) stands for the transition matrix.

steadyStates(mcWeather)
##          sunny    cloudy      rain
## [1,] 0.4636364 0.3181818 0.2181818

It is possible for a Markov chain to have more than one stationary distribution, as the gambler's ruin example shows.

gamblerRuinMarkovChain <- function(moneyMax, prob = 0.5) {
  m <- markovchain:::zeros(moneyMax + 1)
  m[1, 1] <- m[moneyMax + 1, moneyMax + 1] <- 1
  states <- as.character(0:moneyMax)
  rownames(m) <- colnames(m) <- states

  for (i in 2:moneyMax) {
    m[i, i - 1] <- 1 - prob
    m[i, i + 1] <- prob
  }

  new("markovchain", transitionMatrix = m,
      name = paste("Gambler ruin", moneyMax, "dim", sep = " "))
}

mcGR4 <- gamblerRuinMarkovChain(moneyMax = 4, prob = 0.5)
show(mcGR4)
## Gambler ruin 4 dim 
##  A  5 - dimensional discrete Markov Chain defined by the following states: 
##  0, 1, 2, 3, 4 
##  The transition matrix  (by rows)  is defined as follows: 
##     0   1   2   3   4
## 0 1.0 0.0 0.0 0.0 0.0
## 1 0.5 0.0 0.5 0.0 0.0
## 2 0.0 0.5 0.0 0.5 0.0
## 3 0.0 0.0 0.5 0.0 0.5
## 4 0.0 0.0 0.0 0.0 1.0
steadyStates(mcGR4)
##      0 1 2 3 4
## [1,] 0 0 0 0 1
## [2,] 1 0 0 0 0
mcGR4Df <- as(mcGR4, "data.frame")
graphe(mcGR4Df,"t0","t1","prob")
reduce_zero_probability_edges(mcGR4Df,"t0","t1","prob")

Here are the two steady states:

steadyStates(mcGR4)
##      0 1 2 3 4
## [1,] 0 0 0 0 1
## [2,] 1 0 0 0 0

Recurrent classes

A recurrent class is a set of states in a Markov chain in which each state can reach every other state in the set. It is a closed set of states that, once entered, cannot be left.

recurrentClasses(mcGR4)
## [[1]]
## [1] "0"
## 
## [[2]]
## [1] "4"
recurrentClasses(mcWeather)
## [[1]]
## [1] "sunny"  "cloudy" "rain"

Classification of states

absorbingStates(mcGR4)
## [1] "0" "4"
absorbingStates(mcWeather)
## character(0)

All states that belong to a transient class are called “transient”, and a specific method has been written to list them.

P <- markovchain:::zeros(10)
P[1, c(1, 3)] <- 1/2;
P[2, 2] <- 1/3; P[2,7] <- 2/3;
P[3, 1] <- 1;
P[4, 5] <- 1;
P[5, c(4, 5, 9)] <- 1/3;
P[6, 6] <- 1;
P[7, 7] <- 1/4; P[7,9] <- 3/4;
P[8, c(3, 4, 8, 10)] <- 1/4;
P[9, 2] <- 1;
P[10, c(2, 5, 10)] <- 1/3;
rownames(P) <- letters[1:10]
colnames(P) <- letters[1:10]
probMc <- new("markovchain", transitionMatrix = P,
name = "Probability MC")
summary(probMc)
## Probability MC  Markov chain that is composed by: 
## Closed classes: 
## a c 
## b g i 
## f 
## Recurrent classes: 
## {a,c},{b,g,i},{f}
## Transient classes: 
## {d,e},{h},{j}
## The Markov chain is not irreducible 
## The absorbing states are: f
transientStates(probMc)
## [1] "d" "e" "h" "j"

The canonicForm method turns a Markov chain into its canonical form, reordering the states so that the recurrent classes come first, followed by the transient states.

probMcCanonic <- canonicForm(probMc)
probMc
## Probability MC 
##  A  10 - dimensional discrete Markov Chain defined by the following states: 
##  a, b, c, d, e, f, g, h, i, j 
##  The transition matrix  (by rows)  is defined as follows: 
##     a         b    c         d         e f         g    h         i         j
## a 0.5 0.0000000 0.50 0.0000000 0.0000000 0 0.0000000 0.00 0.0000000 0.0000000
## b 0.0 0.3333333 0.00 0.0000000 0.0000000 0 0.6666667 0.00 0.0000000 0.0000000
## c 1.0 0.0000000 0.00 0.0000000 0.0000000 0 0.0000000 0.00 0.0000000 0.0000000
## d 0.0 0.0000000 0.00 0.0000000 1.0000000 0 0.0000000 0.00 0.0000000 0.0000000
## e 0.0 0.0000000 0.00 0.3333333 0.3333333 0 0.0000000 0.00 0.3333333 0.0000000
## f 0.0 0.0000000 0.00 0.0000000 0.0000000 1 0.0000000 0.00 0.0000000 0.0000000
## g 0.0 0.0000000 0.00 0.0000000 0.0000000 0 0.2500000 0.00 0.7500000 0.0000000
## h 0.0 0.0000000 0.25 0.2500000 0.0000000 0 0.0000000 0.25 0.0000000 0.2500000
## i 0.0 1.0000000 0.00 0.0000000 0.0000000 0 0.0000000 0.00 0.0000000 0.0000000
## j 0.0 0.3333333 0.00 0.0000000 0.3333333 0 0.0000000 0.00 0.0000000 0.3333333
is.accessible(object = probMc, from = "a", to = "c")
## [1] TRUE

First passage time (hitting time) distributions and means

Let \(X_0,X_1,\ldots\) be a Markov chain with state space \(S\), initial probability distribution \(\pi\), and transition probability matrix \(P\). Define the first passage time from state \(i\) to state \(j\) as the number \(T_{ij}\) of steps taken by the chain until it arrives for the first time at state \(j\), given that \(X_0 = i\). This is a random variable with values in the set of positive integers (possibly infinite). Its probability distribution function is given by

\[ h^{(n)}_{ij} = p(T_{ij} = n) = p(X_n = j,\, X_{n-1} \neq j,\, \ldots,\, X_1 \neq j \,|\, X_0 = i). \]

The first passage time probabilities can be found recursively as follows:

  1. \(h^{(1)}_{ij} := p_{ij}\)

  2. \(h^{(n)}_{ij} := \sum_{k\in S\setminus\{j\}} p_{ik} \, h^{(n-1)}_{kj}\)

Let \(H^{(n)}\) denote the matrix with entries \(h^{(n)}_{ij}\) and \(H_0^{(n)}\) the same matrix except that the diagonal entries are set equal to \(0\). Then \(H^{(1)} = P\) and

\[ H^{(n)} = P\cdot H_0^{(n-1)} \] Let \(h_{ij}\) (without superscript) be the reaching probability from state \(i\) to \(j\), i.e., the probability that state \(j\) is ever reached from state \(i\). Then

\[ h_{ij} = p(T_{ij} < \infty) = \sum_{n=1}^\infty P(T_{ij} = n) = \sum_{n=1}^\infty h^{(n)}_{ij}. \]

firstPassagePdF <- firstPassage(object = mcWeather, state = "sunny", n = 10)
firstPassagePdF[3, 3]
## [1] 0.121
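
As a cross-check, the recursion \(H^{(1)} = P\), \(H^{(n)} = P\cdot H_0^{(n-1)}\) can be implemented directly. The following sketch (our own helper, not the package's internal code) reproduces the value above:

firstPassageManual <- function(P, n) {
  H <- P                          # H^(1) = P
  if (n >= 2) for (k in 2:n) {
    H0 <- H
    diag(H0) <- 0                 # zero the diagonal: H_0^(k-1)
    H <- P %*% H0                 # H^(k) = P . H_0^(k-1)
  }
  H                               # matrix of the h^(n)_{ij}
}
Pw <- as(mcWeather, "matrix")
firstPassageManual(Pw, 3)["sunny", "rain"]  # 0.121, matching firstPassagePdF[3, 3]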

State \(i\) is called recurrent if \(h_{ii} = 1\), so that starting at state \(i\) the chain with probability \(1\) eventually returns to \(i\). If \(h_{ii} < 1\), state \(i\) is called transient: there is in this case a positive probability that, starting at \(i\), the chain never again returns to \(i\).

For any \(i\), define the recurrence time of state \(i\) as the random variable \(T_{ii}\). Then if state \(i\) is recurrent we have \(P(T_{ii} < \infty) = 1\).

Denote the expected recurrence time to \(i\) by \(\mu_{ii} = \mathbb{E}[T_{ii}]\).

meanFirstPassageTime(mcWeather)
##           sunny   cloudy     rain
## sunny  0.000000 4.285714 6.666667
## cloudy 3.725490 0.000000 5.000000
## rain   4.117647 2.857143 0.000000
meanFirstPassageTime(mcWeather,"rain")
##    sunny   cloudy 
## 6.666667 5.000000

The meanRecurrenceTime method gives the mean recurrence time (the expected number of steps needed to return to a state, given that it was the initial one) for each recurrent state in the transition probabilities matrix of a DTMC.

meanRecurrenceTime(mcWeather)
##    sunny   cloudy     rain 
## 2.156863 3.142857 4.583333
recurrentStates(probMc)
## [1] "a" "b" "c" "f" "g" "i"
meanRecurrenceTime(probMc)
##        f        b        g        i        a        c 
## 1.000000 2.555556 2.875000 3.833333 1.500000 3.000000

The committor probability tells us the probability of reaching a given state before another given state. Suppose that we start on a cloudy day; the probability of experiencing a rainy day before a sunny one is 0.5:

committorAB(mcWeather,3,1)
##  sunny cloudy   rain 
##    0.0    0.5    1.0

In the case of the mcWeather Markov chain we obtain a matrix with all its elements set to 1. That makes sense (and is desirable), since if today is sunny we expect it to be sunny again at some point in time, and the same for rainy weather (which also assures good harvests):

hittingProbabilities(mcWeather)
##        sunny cloudy rain
## sunny      1      1    1
## cloudy     1      1    1
## rain       1      1    1

Simulation

Simulating a random sequence from an underlying DTMC is quite easy thanks to the function rmarkovchain. The following code generates a year of weather states according to the stochastic process underlying mcWeather.

weathersOfDays <- rmarkovchain(n = 365, object = mcWeather, t0 = "sunny")
weathersOfDays[1:30]
##  [1] "sunny"  "rain"   "sunny"  "sunny"  "sunny"  "rain"   "cloudy" "cloudy"
##  [9] "rain"   "rain"   "rain"   "sunny"  "sunny"  "cloudy" "cloudy" "cloudy"
## [17] "sunny"  "cloudy" "cloudy" "sunny"  "sunny"  "sunny"  "sunny"  "sunny" 
## [25] "rain"   "rain"   "rain"   "cloudy" "sunny"  "sunny"

Estimation

A time-homogeneous Markov chain can be fit from given data. Four methods have been implemented in the current version of the markovchain package: maximum likelihood, maximum likelihood with Laplace smoothing, a bootstrap approach, and maximum a posteriori.

The MLE method

The likelihood of a Markov chain is given by (where \(P\) stands for the transition matrix):

\[ \begin{eqnarray*} L(\pi,P) &=& p(\underline{X}_{1:T}=\underline{x}_{1:T} \,|\, \pi,P) \\ &=& p(X_1=x_1 \,|\, \pi)\cdot \Pi_{t=2}^T \, p(X_t=x_t \,|\, X_{t-1}=x_{t-1},P) \end{eqnarray*} \]

We neglect the prior for \(X_1\), considering it a constant, so we set \(\pi=\mbox{constant}\). We focus on the determination of the transition matrix \(P=[p_{ij}]_{ij}\):

\[ L(P) \sim \prod_{t=2}^T \, p(X_t=x_t \,|\, X_{t-1}=x_{t-1},P) \]

Let \(N_{ij}\) be the number of times state \(i\) is followed by state \(j\) in the chain \(X_1,X_2,\ldots,X_T\).
\(N_{ij}\) is a random variable and \(n_{ij}\) is one of its realizations:

\[ \begin{eqnarray*} L(P) &\sim& \prod_{t=2}^T \, p(X_t=x_t \,|\, X_{t-1}=x_{t-1},P) \\ &=& \prod_{i=1}^K \prod_{j=1}^K \, p_{ij}^{n_{ij}} \end{eqnarray*} \] Notice the change: we now multiply over the squared set of possible discrete values for \(X_t\), i.e. over \(\{1,\ldots,K\} \times \{1,\ldots,K\}\).

In order to maximize the log likelihood \(\log(L(P))\) under the constraint that \(\forall i,\,\sum_j p_{ij}=1\) we consider the method of Lagrange’s multipliers. The objective function is:

\[ \log(L(P))-\sum_{i=1}^K \lambda_{i} \cdot \left(\sum_j p_{ij}-1\right) \]

Taking derivatives with respect to \(\lambda_{i}\) gives the \(i\)th constraint back. Taking derivatives with respect to \(p_{ij}\):

\[ \begin{align*} & & \quad & 0 = \frac{n_{ij}}{p_{ij}}-\lambda_i \\ \iff & & & p_{ij} = \frac{n_{ij}}{\lambda_i} \end{align*} \]

Substituting into the constraint equation yields:

\[ \sum_j \frac{n_{ij}}{\lambda_i}=1 \iff \sum_j n_{ij} = \lambda_i\,. \]

We obtain the ML-estimate (MLE):

\[ \hat{p}_{ij} = \frac{n_{ij}}{\sum_j n_{ij}}\,. \]

The ML-estimator itself is the random variable

\[ \hat{P}_{ij} := \frac{N_{ij}}{\sum_j N_{ij}}\,. \] \(N_{ij}\) can be written as a sum of indicator functions:

\[ N_{ij} = \sum_{t=1}^{T-1} I_{x=i}(x_t)\cdot I_{x=j}(x_{t+1})\,. \]

Regarding the variance of this estimator: since the distribution of our MLE has no analytic form, we cannot estimate the variance with the inverse of the Fisher information matrix.

Instead, in case we only have a single simulation of a Markov chain at our disposal, we may obtain a distribution for the ML estimator with the help of the bootstrap method.

Of course, in our present case we have more than a single simulation at our disposal, because we know the transition matrix. We can therefore obtain a distribution for the ML estimator with a direct Monte Carlo method: knowing the transition matrix allows us to simulate many other Markov chains and to calculate the estimator value for each one.

Bootstrap method

Generally speaking: let \(Z\) be an estimator. We want to estimate the law of \(Z\) given the (possibly many) cumulative distribution function(s) \(F_k\) of the variables from which \(Z\) is formed, each \(F_k\) understood as a nuisance parameter. Because each \(F_k\) is unknown, we have to estimate it. Therefore we consider:

\[ \widehat{\mathcal{L}(Z|F_k)} := \mathcal{L}(Z|\widehat{F}_k). \]

As \(\widehat{F}_k\) we take the empirical cumulative distribution. We then get \(\mathcal{L}(Z|\widehat{F}_k)\) by simulation.

A strategy in case we have at our disposal only one realization of the above-defined mcWeather Markov chain:

Let \(Z:=\hat{P}_{ij}\) be the estimator. \(Z\) is a function of the estimators \(N_{ij}\), which themselves are functions of the estimators \(I_{x=i}\cdot I_{x=j}\). A single Markov chain simulation of length \(T\) yields only one realization per estimator \(N_{ij}\), whereas it yields \(T-1\) realizations per estimator \(I_{x=i}\cdot I_{x=j}\). That’s why we consider generating, for every combination \((i,j)\), the cumulative distribution function \(F_{ij}\) of \(I_{x=i}\cdot I_{x=j}\), which takes only the two values \(0\) and \(1\).

In our case, \((i,j)\in\{\mbox{sunny, cloudy, rain}\}\times\{\mbox{sunny, cloudy, rain}\}\), so we have to consider 9 such cumulative distribution functions \(F_{ij}\). For each one we use the bootstrap method in order to get many samples of each of the 9 estimators \(I_{x=i}\cdot I_{x=j}\), associating them in as many 9-tuples of the form \((0,0,1,\ldots,1,0)\). Summing component-wise yields estimates \(n_{ij}\) for each estimator \(N_{ij}\). In turn this yields estimates \(z\) for the estimator \(Z\). The empirical cumulative distribution formed by the values \(z\) estimates the real cumulative distribution function \(F\) of \(Z\) and thus its law.
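
One simple way to realize this strategy in code (a sketch with our own helper boot_p_hat, not the package's bootstrap): resample the 364 observed transition pairs \((x_t,\,x_{t+1})\) with replacement, recompute \(n_{ij}\) and \(\hat{p}_{ij}\) on each resample, and look at the spread of the resulting values. The pairs are resampled independently, which is precisely the approximation described above.

boot_p_hat <- function(chain, i, j, B = 1000) {
  from <- chain[-length(chain)]                   # x_t, for t = 1, ..., T-1
  to   <- chain[-1]                               # x_{t+1}
  replicate(B, {
    idx  <- sample(length(from), replace = TRUE)  # bootstrap resample of the pairs
    n_ij <- sum(from[idx] == i & to[idx] == j)    # realization of N_ij
    n_i  <- sum(from[idx] == i)                   # realization of sum_j N_ij
    if (n_i == 0) NA else n_ij / n_i
  })
}
# e.g. sd(boot_p_hat(weathersOfDays, "sunny", "cloudy"), na.rm = TRUE)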

Computation of the MLE

Consider the following simulation of the mcWeather Markov chain over 365 days:

weathersOfDays <- rmarkovchain(n = 365, object = mcWeather, t0 = "sunny")
weathersOfDays[1:30]
##  [1] "sunny"  "sunny"  "sunny"  "rain"   "sunny"  "rain"   "rain"   "sunny" 
##  [9] "sunny"  "sunny"  "sunny"  "cloudy" "sunny"  "sunny"  "cloudy" "cloudy"
## [17] "cloudy" "cloudy" "sunny"  "sunny"  "sunny"  "rain"   "cloudy" "cloudy"
## [25] "cloudy" "cloudy" "cloudy" "rain"   "rain"   "sunny"

We define the indicator function \(I_{x=i}(x_t)\), where \(i\) stands for a given state and \(t\) for a given position index in the chain: in our example, \(i\in\{\mbox{sunny},\mbox{cloudy},\mbox{rain}\}\) and \(t\in 1:365\):

I <- function(simulation_name, i, t){
  # indicator: 1 if the chain is in state i at position t, 0 otherwise
  as.numeric(simulation_name[t] == i)
}
I(weathersOfDays,"sunny",3)
## [1] 1

We then calculate the number of transitions \(n_{ij}\) as a sum of products of indicator functions:

numberOfTransitions <- function(simulation_name, i, j){
  # product of indicators: 1 iff the chain is in state i at t and in state j at t+1
  II <- function(t){
    I(simulation_name, i, t) * I(simulation_name, j, t + 1)
  }
  v <- sapply(seq_len(length(simulation_name) - 1), II)
  sum(v)
}
numberOfTransitions(weathersOfDays,"sunny","sunny")
## [1] 105

We verify that the counts sum to 364:

numbers <- function(simulation_name){
  v <- c("sunny","cloudy","rain")
  df <- expand.grid(v,v)
  numberOfTrans <- function(i,j){
    numberOfTransitions(simulation_name,i,j)
  }
  numbers <- mapply(numberOfTrans,df[,1],df[,2])
  numbers <- matrix(numbers,nrow=3,ncol=3)
  dimnames(numbers) <- list(v,v)
  return(numbers)
}
numbers(weathersOfDays)
##        sunny cloudy rain
## sunny    105     27   23
## cloudy    32     51   36
## rain      17     42   31
# to determine which index corresponds to rows and which to columns:
numberOfTransitions(weathersOfDays,"sunny","cloudy")
## [1] 27
numbers(weathersOfDays)["cloudy",]
##  sunny cloudy   rain 
##     32     51     36
sum(numbers(weathersOfDays))
## [1] 364

The estimator \(\hat{p}_{ij} = \frac{n_{ij}}{\sum_j n_{ij}}\) is then given by:

p_hat <- function(simulation_name, i, j){
  # n_ij / sum_j n_ij, computed from the given simulation
  numberOfTransitions(simulation_name, i, j) / sum(numbers(simulation_name)[i, ])
}
p_hat(weathersOfDays,"sunny","rain")
## [1] 0.1483871
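
The whole estimated transition matrix can also be obtained in one step, by row-normalizing the count matrix returned by the numbers function above (a quick check against p_hat and against markovchainFit below):

n_mat <- numbers(weathersOfDays)  # matrix of the counts n_ij
round(n_mat / rowSums(n_mat), 4)  # row i divided by sum_j n_ij: the MLE of P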

The markovchain package easily yields the MLE together with the corresponding standard errors:

weatherFittedMLE <- markovchainFit(data = weathersOfDays, method = "mle",name = "Weather")
weatherFittedMLE$estimate
## Weather 
##  A  3 - dimensional discrete Markov Chain defined by the following states: 
##  cloudy, rain, sunny 
##  The transition matrix  (by rows)  is defined as follows: 
##           cloudy      rain     sunny
## cloudy 0.4285714 0.3025210 0.2689076
## rain   0.4666667 0.3444444 0.1888889
## sunny  0.1741935 0.1483871 0.6774194
weatherFittedMLE$standardError
##            cloudy       rain      sunny
## cloudy 0.06001200 0.05042017 0.04753659
## rain   0.07200823 0.06186405 0.04581228
## sunny  0.03352356 0.03094085 0.06610936

Obtaining the variance with Monte Carlo

As said above, the standard deviation can be obtained using a Monte Carlo method to derive a distribution for the ML estimator.

library("ProjectTemplate")
## Loading required package: digest
cache("vv",{
  L <- 1000
  count <- 1
  vv <- vector("numeric",length=L)
  while(count <= L){
    simul <- rmarkovchain(n = 365, object = mcWeather, t0 = "sunny")
    simul[1:30]
    vv[count] <- p_hat(simul,"sunny","cloudy")
    count <- count+1
  }
  vv
}
)
## Loading required package: formatR
##   Updating existing cache entry from CODE: vv
cache("ww",{
  L <- 1000
  count <- 1
  ww <- vector("numeric",length=L)
  while(count <= L){
    simul <- rmarkovchain(n = 365, object = mcWeather, t0 = "sunny")
    simul[1:30]
    ww[count] <- p_hat(simul,"sunny","cloudy")
    count <- count+1
  }
  ww
}
)
##   Updating existing cache entry from CODE: ww
vv <- sort(vv,decreasing=FALSE)
ww <- sort(ww,decreasing=FALSE)
set.seed(42)
ecdf1 <- ecdf(vv)
ecdf2 <- ecdf(ww)
plot(ecdf1, verticals=TRUE, do.points=FALSE)
plot(ecdf2, verticals=TRUE, do.points=FALSE, add=TRUE, col='orange')

We can access the cumulative distribution function itself:

www <- seq(from=0.1,to=0.4,by=0.01)
www <- as(www,"vector")
ecdf1(www)
##  [1] 0.000 0.000 0.000 0.002 0.003 0.012 0.018 0.045 0.065 0.161 0.307 0.385
## [13] 0.571 0.663 0.803 0.856 0.928 0.946 0.976 0.986 0.997 0.999 0.999 1.000
## [25] 1.000 1.000 1.000 1.000 1.000 1.000 1.000
density(vv)
## 
## Call:
##  density.default(x = vv)
## 
## Data: vv (1000 obs.);    Bandwidth 'bw' = 0.006531
## 
##        x                y           
##  Min.   :0.1094   Min.   : 0.00069  
##  1st Qu.:0.1692   1st Qu.: 0.17198  
##  Median :0.2290   Median : 1.67727  
##  Mean   :0.2290   Mean   : 4.17678  
##  3rd Qu.:0.2888   3rd Qu.: 7.97679  
##  Max.   :0.3486   Max.   :14.04456

The variance for a non-negative random variable \(X\) can be expressed in terms of the cumulative distribution function \(F\) using \[ Var(X) = 2 \int_0^\infty u(1-F(u))\,du \;- \left( \int_0^\infty (1-F(u))\,du \right)^2 \]

This expression can be used to calculate the variance in situations where the cumulative distribution function, but not the density, can be conveniently expressed.

Proof of the formula:

Let be \(X\) our random variable. We have that \(Var(X)=E[X^2]-E[X]^2\).
Step 1: For every nonnegative random variable X, whether discrete or continuous,

\[ X=\int_0^X\mathrm dt=\int_0^{+\infty}\mathbf 1_{X\gt t}\,\mathrm dt=\int_0^{+\infty}\mathbf 1_{X\geqslant t}\,\mathrm dt, \]

Hence

\[ \mathrm E(X)=\int_0^{+\infty}\mathrm P(X\gt t)\,\mathrm dt=\int_0^{+\infty}\mathrm P(X\geqslant t)\,\mathrm dt. \]

Because of \(Pr[X \geq x] = 1 - F(x)\), we have showed that

\[ \mathrm E(X)=\int_0^{+\infty} \left(1 - F(x) \right) \,\mathrm dt. \] Step 2:

we calculate \(E[X^2]=\int_0^\infty x^2f(x)dx\), where \(f\) stands for the density of \(X\).

We find this expression by integration by parts. Let \(u:=x^2\) and \(dv:=f(x)dx\). Then \(du=2xdx\). We set that \(v=F(x)-1\). Then our integral is

\[ \left. x^2(1-F(x))\right|_0^\infty +\int_0^\infty 2x(1-F(x))\,dx. \]

The first part vanishes at both ends. So we find that

\[ E(X^2)=\int_0^\infty 2x(1-F(x))\,dx. \]

qed

We now implement the formula

\[ Var(X) = 2 \int_0^\infty u(1-F(u))\,du \;- \left( \int_0^\infty (1-F(u))\,du \right)^2 \]

for \(X\) defined as \(\hat{p}_{\mbox{sunny, cloudy}}\), and \(F\) given by the above-defined function ecdf1, evaluated on a grid from 0.1 to 0.4 (see the graph above).

step <- 0.01 # grid width; F is evaluated from 0.1 to 0.4, compare with the value range of the vector vv of simulated values of p_hat("sunny","cloudy").

u <- seq(from = 0.1, to = 0.4, by = 0.01)
F <- ecdf1(u)
integrand1 <- u * (1 - F) * step  # Riemann-sum terms of the first integral
integrand2 <- (1 - F) * step      # Riemann-sum terms of the second integral
integral1 <- sum(integrand1)
integral2 <- sum(integrand2)
var <- 2 * integral1 - integral2^2
var
## [1] 0.02418967

Thus the estimated standard deviation is:

sd <- sqrt(var)
sd
## [1] 0.1555303

Compare with the corresponding value for (“sunny”,“cloudy”):

weatherFittedMLE$standardError
##            cloudy       rain      sunny
## cloudy 0.06001200 0.05042017 0.04753659
## rain   0.07200823 0.06186405 0.04581228
## sunny  0.03352356 0.03094085 0.06610936

Getting a density function

Getting a density function for an empirical cumulative distribution allows one to search for a corresponding Normal or Beta density, as another way to obtain the mean and variance.

Below we define a function that does the job:

library(scam)

test <- function (x, n.knots, n.cells) {

  ## get ECDF
  n <- length(x)
  x <- sort(x)
  y <- 1:n / n
  dat <- data.frame(x = x, y = y)  ## make sure `scam` can find `x` and `y`

  ## fit a monotonically increasing P-spline for ECDF
  fit <- scam::scam(y ~ s(x, bs = "mpi", k = n.knots), data = dat,
                    weights = c(n, rep(1, n - 2), 10 * n))
  ## interior knots
  xk <- with(fit$smooth[[1]], knots[4:(length(knots) - 3)])
  ## spline values at interior knots
  yk <- predict(fit, newdata = data.frame(x = xk))
  ## reparametrization into a monotone interpolation spline
  f <- stats::splinefun(xk, yk, "hyman")

  par(mfrow = c(1, 2))

  plot(x, y, pch = 19, col = "gray")  ## ECDF
  lines(x, f(x), type = "l")          ## smoothed ECDF
  title(paste0("number of knots: ", n.knots,
               "\neffective degree of freedom: ", round(sum(fit$edf), 2)),
        cex.main = 0.8)

  xg <- seq(min(x), max(x), length = n.cells)
  plot(xg, f(xg, 1), type = "l")     ## density estimated by scam
  lines(stats::density(x), col = 2)  ## a proper density estimate by density

  ## return smooth ECDF function
  f
}

## try large sample size
f <- test(vv, n.knots = 20, n.cells = 100)

Shiny

library(sn)
library(ggplot2)
library(shiny)

ui <- fluidPage(
  numericInput("a", label = "alpha", min = -3, max = 3, value = 0.0),
  plotOutput("plot")
)

server <- function(input, output, session) {
  df <- reactive({
    alpha <- as.numeric(input$a)
    x <- seq(-4, 4, 0.1)
    y1 <- dsn(x, 0, 1.2, alpha)

    data.frame(x, y1)
  })


  output$plot <- renderPlot({
    ggplot(df(), aes(x)) +
      geom_line(aes(y = y1), size = 1.0, color = "black")
  })
}

shinyApp(ui, server)

Prediction

The n-step forward predictions can be obtained using the predict methods explicitly written for markovchain and markovchainList objects. The prediction is the mode of the conditional distribution of \(X_{t+1}\) given \(X_t = s_j\), where \(s_j\) is the last realization of the DTMC (homogeneous or semi-homogeneous).

The 3-day forward predictions from a markovchain object can be generated as follows, assuming that the last two days were respectively “cloudy” and “sunny”.

predict(object = weatherFittedMLE$estimate, newdata = c("cloudy", "sunny"), n.ahead = 3)
## [1] "sunny" "sunny" "sunny"

Statistical Tests

In this section, we describe the statistical tests available for assessing the Markov property (verifyMarkovProperty), the order (assessOrder), and the stationarity (assessStationarity) of a Markov chain sequence, as well as the divergence test for empirically estimated transition matrices (divergenceTest). Most of these tests are based on \(\chi^2\) statistics. Relevant references are Kullback et al. (1962) and Anderson and Goodman (1957).

Kullback S, Kupperman M, Ku H (1962). “Tests for Contingency Tables and Markov Chains.” Technometrics, 4(4), 573–608

Anderson TW, Goodman LA (1957). “Statistical inference about Markov chains.” The Annals of Mathematical Statistics, pp. 89–110

Assessing the Markov property of a Markov chain sequence

sample_sequence<-c("a", "b", "a", "a", "a", "a", "b", "a", "b", "a",
"b", "a", "a", "b", "b", "b", "a")
verifyMarkovProperty(sample_sequence)
## Testing markovianity property on given data sequence
## Chi - square statistic is: 0.28 
## Degrees of freedom are: 5 
## And corresponding p-value is: 0.9980024

Assessing the order of a Markov chain sequence

Consider a second-order Markov chain. Given that an individual is in state \(i\) at \(t-2\) and in \(j\) at \(t-1\), let \(p_{ijk}(t)\) be the probability of being in state \(k\) at \(t\). When the second-order chain is stationary, \(p_{ijk}(t)=p_{ijk}\; \forall t\). A first-order stationary chain is a special second-order chain, one for which \(p_{ijk}\) does not depend on \(i\). On the other hand, a second-order chain can be represented as a more complicated first-order chain. To do this, let the pair of successive states \(i\) and \(j\) define a composite state \((i,j)\). Then the probability of the composite state \((j,k)\) at \(t\), given the composite state \((i,j)\) at \(t-1\), is \(p_{ijk}(t)\), and the probability of the composite state \((h,k)\), \(h\neq j\), given \((i,j)\), is zero. The composite states form a chain with \(m^2\) states and with certain transition probabilities equal to \(0\). This representation is useful because results for first-order Markov chains can be carried over.

Now let \(n_{ijk}(t)\) be the number of individuals in state \(i\) at \(t-2\), in \(j\) at \(t-1\), and in \(k\) at \(t\), and let \(n_{ij}(t-1)=\sum_k n_{ijk}(t)\). We assume \(n_i(0)\) and \(n_{ij}(1)\) are nonrandom. The \(n_{ijk}(t),\ i,j,k\in 1:m,\ t\in 2:T\), form a set of sufficient statistics. The conditional distribution of \(n_{ijk}(t)\), given \(n_{ij}(t-1)\), is

\[ \frac{n_{ij}(t-1)!}{\prod n_{ijk}(t)!} \prod_{k=1}^m p_{ijk}^{n_{ijk}(t)} \] The ML estimate of \(p_{ijk}\) is (in analogy with the above detailed development for the first-order case):

\[ \hat{p}_{ijk} = n_{ijk} \bigg/ \sum_{l=1}^m n_{ijl} = \sum_{t=2}^T n_{ijk}(t) \bigg/ \sum_{t=2}^T n_{ij}(t-1). \]

Now let us consider testing the null hypothesis that the chain is first-order against the alternative that it is second-order. The null hypothesis is that \(p_{1jk}=p_{2jk},\ldots,p_{mjk}=p_{jk},\quad j,k\in1:m\). The two hypotheses concern nested models, and that is why we use the likelihood ratio test for restriction testing, the first-order Markov model being a restriction of the second-order model.

The test statistic is

\[ \begin{eqnarray*} T &=& 2\log\left( \frac{\mbox{likelihood of the not restricted model }H_A}{\mbox{likelihood of the restricted model }H_0} \right) \\ &=& 2\log\left( \prod_{i,j,k=1}^m \left( \frac{\hat{p}_{ijk}}{\hat{p}_{jk}} \right)^{n_{ijk}} \right) \end{eqnarray*} \]

where

\[ \begin{eqnarray*} \hat{p}_{jk} &=& \sum_{i=1}^m n_{ijk} \bigg/ \sum_{i=1}^m\sum_{l=1}^m n_{ijl} \\ &=& \sum_{t=2}^T n_{jk}(t) \bigg/ \sum_{t=1}^{T-1} n_j(t) \end{eqnarray*} \]

is the ML estimate of \(p_{jk}\). That likelihood ratio statistic \(T\) has an asymptotic \(\chi^2\)-distribution with \(m^2(m-1)-m(m-1)=m(m-1)^2\) degrees of freedom.
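
As an illustration, the statistic \(T\) can be computed directly from a realized sequence. The sketch below (our own function, not the package's assessOrder) is a direct transcription of the formulas above:

order_lrt <- function(x) {
  s <- sort(unique(x)); m <- length(s); n <- length(x)
  # n_ijk: number of times with state i at t-2, j at t-1 and k at t
  n3 <- table(factor(x[1:(n - 2)], levels = s),
              factor(x[2:(n - 1)], levels = s),
              factor(x[3:n], levels = s))
  n2  <- apply(n3, c(1, 2), sum)   # n_ij: occurrences of (i, j) at (t-2, t-1)
  njk <- apply(n3, c(2, 3), sum)   # sum_i n_ijk
  nj  <- rowSums(njk)              # sum_i sum_l n_ijl
  stat <- 0
  for (i in 1:m) for (j in 1:m) for (k in 1:m)
    if (n3[i, j, k] > 0)
      stat <- stat + 2 * n3[i, j, k] *
        log((n3[i, j, k] / n2[i, j]) / (njk[j, k] / nj[j]))
  df <- m * (m - 1)^2
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}
# e.g. order_lrt(rain$rain), with the rain data loaded below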

The following function implements a test (not necessarily the likelihood ratio test presented above) of the order of a realized sequence supposed to be generated from a Markov chain of unknown order.

data(rain)
assessOrder(rain$rain)
## Warning in assessOrder(rain$rain): The accuracy of the statistical inference
## functions has been questioned. It will be thoroughly investigated in future
## versions of the package.
## The assessOrder test statistic is:  26.09575 
## The Chi-Square d.f. are:  12 
## The p-value is:  0.01040395

Assessing the stationarity of a Markov chain sequence

The assessStationarity function assesses if the transition probabilities of the given chain change over time.

assessStationarity(rain$rain, 10)
## Warning in assessStationarity(rain$rain, 10): The accuracy of the statistical
## inference functions has been questioned. It will be thoroughly investigated in
## future versions of the package.
## Warning in chisq.test(mat): Chi-squared approximation may be incorrect

## Warning in chisq.test(mat): Chi-squared approximation may be incorrect

## Warning in chisq.test(mat): Chi-squared approximation may be incorrect
## The assessStationarity test statistic is:  4.181815 
## The Chi-Square d.f. are:  54 
## The p-value is:  1

How to plot a graph with DiagrammeR

library(DiagrammeR)

a_graph <-
  create_graph() %>%
  add_node() %>%
  add_node() %>%
  add_edge(
    from = 1,
    to = 2)
render_graph(a_graph)
b_graph <-
  a_graph %>%
  delete_edge(
    from = 1,
    to = 2)
render_graph(b_graph)
c_graph <-
  b_graph %>%
  add_node(
    from = 1,
    to = 2)
render_graph(c_graph)
d_graph <-
  c_graph %>%
  add_node(
    type = "type_a",
    node_aes = node_aes(
      color = "steelblue",
      fillcolor = "lightblue",
      fontcolor = "gray35"),
    node_data = node_data(
      value = 2.5)) %>%
  add_edge(
    from = 1,
    to = 3,
    rel = "interacted_with",
    edge_aes = edge_aes(
      color = "red",
      arrowhead = "vee",
      tooltip = "Red Arrow"),
    edge_data = edge_data(
      value = 2.5))
render_graph(d_graph)
library(DiagrammeR)

example_graph <-
  create_graph() %>%
  add_pa_graph(
    n = 50,
    m = 1,
    set_seed = 23) %>%
  add_gnp_graph(
    n = 50,
    p = 1/100,
    set_seed = 23) %>%
  join_node_attrs(
    df = get_betweenness(.)) %>%
  join_node_attrs(
    df = get_degree_total(.)) %>%
  colorize_node_attrs(
    node_attr_from = total_degree,
    node_attr_to = fillcolor,
    palette = "Greens",
    alpha = 90) %>%
  rescale_node_attrs(
    node_attr_from = betweenness,
    to_lower_bound = 0.5,
    to_upper_bound = 1.0,
      node_attr_to = height) %>%
  select_nodes_by_id(
    nodes = get_articulation_points(.)) %>%
  set_node_attrs_ws(
    node_attr = peripheries,
    value = 2) %>%
  set_node_attrs_ws(
    node_attr = penwidth,
    value = 3) %>%
  clear_selection() %>%
  set_node_attr_to_display(
    attr = NULL)
## `select_nodes_by_id()` INFO: created a new selection of 34 nodes
## `clear_selection()` INFO: cleared an existing selection of 34 nodes
example_graph %>%
  render_graph(layout = "nicely")
## Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if
## `.name_repair` is omitted as of tibble 2.0.0.
## ℹ Using compatibility `.name_repair`.
## ℹ The deprecated feature was likely used in the DiagrammeR package.
##   Please report the issue at
##   <https://github.com/rich-iannone/DiagrammeR/issues>.

Hidden Markov Models (HMM)

An HMM consists of a discrete-time, discrete-state Markov chain, with hidden states \(z_t\in\{1,\cdots,K\}\), plus an observation model \(p(\underline{x}_t \,|\, z_t)\).

The corresponding joint distribution has the form:

\[ \begin{eqnarray*} p(\underline{z}_{1:T} \,, \underline{x}_{1:T}) &=& p(\underline{z}_{1:T})\cdot p(\underline{x}_{1:T} \,|\, \underline{z}_{1:T})\\ &=& \left[ p(z_1) \cdot \Pi_{t=2}^T p(z_t \,|\, z_{t-1}) \right] \cdot \left[ \Pi_{t=1}^T p(\underline{x}_t \,|\, z_t) \right]\,, \end{eqnarray*} \]

where we use the notation \(p(\underline{z}_{1:T}):=p(\underline{z}_1,\cdots,\underline{z}_T)\).
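
To make this factorization concrete, here is a small sketch that samples \((\underline{z}_{1:T},\,\underline{x}_{1:T})\) along it; the names A (transition matrix), pi0 (initial distribution) and remit (a sampler for the observation model \(p(\underline{x}_t \,|\, z_t)\)) are illustrative:

rhmm <- function(T_len, A, pi0, remit) {
  K <- nrow(A)
  z <- integer(T_len); x <- numeric(T_len)
  z[1] <- sample(K, 1, prob = pi0)              # z_1 ~ p(z_1)
  x[1] <- remit(z[1])                           # x_1 ~ p(x_1 | z_1)
  for (t in 2:T_len) {
    z[t] <- sample(K, 1, prob = A[z[t - 1], ])  # z_t ~ p(z_t | z_{t-1})
    x[t] <- remit(z[t])                         # x_t ~ p(x_t | z_t)
  }
  list(z = z, x = x)
}
# e.g. two hidden states with Gaussian emissions of state-dependent mean:
# rhmm(100, A = matrix(c(0.9, 0.1, 0.2, 0.8), 2, byrow = TRUE),
#      pi0 = c(0.5, 0.5), remit = function(z) rnorm(1, mean = c(-1, 1)[z]))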

An important kind of inference: filtering means computing the belief state \(p(z_t\,|\,\underline{x}_{1:t})\) online, as the data streams in. See the forward algorithm below.

Implementation of the forward algorithm

We are given a Markov chain with a known transition matrix. We describe how to recursively compute the filtered marginals \(p(z_t \,|\, \underline{x}_{1:t})\) in an HMM.

The algorithm has two steps.

The prediction step:

\[ p(z_t=j \,|\, \underline{x}_{1:\,t-1}) = \sum_i p(z_t=j \,|\, z_{t-1}=i) \cdot p( z_{t-1}=i \,|\, \underline{x}_{1:\,t-1}) \] The update step:

\[ \begin{eqnarray*} \alpha_t(j) &:=& p(z_t=j \,|\, \underline{x}_{1:\,t}) \\ &=& p(z_t=j \,|\, \underline{x}_t, \, \underline{x}_{1:\,t-1}) \\ &=& \frac{1}{p(\underline{x}_t \,|\, \underline{x}_{1:\,t-1})} \cdot p(\underline{x}_t \,|\, z_t = j, \, \cancel{ \underline{x}_{1:\,t-1} } ) \cdot p(z_t = j \,|\, \underline{x}_{1:\,t-1}) \\ &=& \frac{1}{\sum_j p(z_t = j \,|\, \underline{x}_{1:\,t-1}) \cdot p(\underline{x}_t \,|\, z_t = j)} \cdot p(\underline{x}_t \,|\, z_t = j) \cdot p(z_t = j \,|\, \underline{x}_{1:\,t-1}) \end{eqnarray*} \]

The denominator of the fraction is to be seen as a normalization constant, which we denote by \(Z_t\).

This process is known as the predict-update cycle. The distribution \(p(z_t=j \,|\, \underline{x}_{1:\,t})\) is called the (filtered) belief state at time \(t\); it is a vector of \(K\) numbers, often denoted by \(\underline{\alpha}_t\).

The proposed algorithm:
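
Below is a minimal sketch of the predict-update cycle (our own code, under the following assumptions: a known \(K\times K\) transition matrix A with rows indexed by \(z_{t-1}\), a \(T\times K\) matrix lik with lik[t, j] \(= p(\underline{x}_t \,|\, z_t = j)\), and an initial distribution prior \(= p(z_1)\); all names are illustrative):

forward <- function(A, lik, prior) {
  T_len <- nrow(lik); K <- ncol(lik)
  alpha <- matrix(0, T_len, K)               # alpha[t, j] = p(z_t = j | x_{1:t})
  pred <- prior                              # p(z_1 = j): no data seen yet
  for (t in 1:T_len) {
    a <- lik[t, ] * pred                     # update step, unnormalized
    alpha[t, ] <- a / sum(a)                 # divide by the normalization constant Z_t
    pred <- as.vector(t(A) %*% alpha[t, ])   # prediction step for t + 1
  }
  alpha
}

Each row of the returned matrix is the belief state \(\underline{\alpha}_t\) described above.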